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Figure  5:  Lowpass-Highpass  Decomposition  of  Identity 
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Figure  7:  Flow  of  ideas  in  Parametric  Spectrum  Analysis 
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Figure  10:  Sequence  of  Experiments  for  Spectrum  Analysis 
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ABSTRACT 

Among  nouparametric  estimators  of  the  power  spectrum,  quadratic  estimators  are  the  only  ones  that 
are  dimensionally  correct.  In  this  chapter  we  motivate  interest  in  quadratic  estimators  by  setting  up  ideal¬ 
ized  experiments  for  spectrum  analysis  and  deriving  maximum  likelihood  estimates  for  narrowband  power. 
These  idealized  experiments  show  that  quadratic  functions  of  the  experimental  data  are  sulficients  statistics 
for  estimating  power.  The  ma.\imum  likelihood  estimates  show  that  low-rank  projection  operators  play  a 
fundamental  role  in  the  theory  of  spectrum  analysis.  The  projection  operator  plays  the  same  role  as  a  band¬ 
pass  filter  in  a  conventional  swept  firequency  spectrum  analyzer.  The  mean-squared  error  of  a  maximum 
likelihood  estimator  of  power  is  inversely  proportional  to  the  rank  of  the  estimator. 

With  our  maximum  likelihood  results  in  hand,  we  turn  to  a  systematic  study  of  quadratic  estimators 
of  the  power  spectrum.  \Ve  prove  a  fundamental  representation  theorem  for  any  quadratic  estimator  of 
the  power  spectrum  that  is  required  to  be  positive  and  modulation-invariant.  The  resulting  estimator  has 
a  multiwindow  interpretation.  The  quadratic  estimator  may  be  tailored  to  produce  a  number  of  classical 
estimators,  including  those  of  Schuster,  Daniell,  Blackman  and  Tukey,  Grenander  and  Rosenblatt,  Clergeot, 
and  Thomson.  In  one  of  its  forms,  the  quadratic  estimator  is  the  maximum  likelihood  estimator  for  the 
power  in  a  narrow  spectral  band.  In  this  form,  the  estimator  projects  data  onto  a  low-rank  subspace  where 
the  power  in  the  subspace  is  the  estimate  of  the  power  in  a  spectral  band.  We  complete  our  study  of 
quadratic  estimators  by  bounding  their  mean-squared  error.  The  results  corroborate  the  bounds  obtained 
from  the  idealized  expiriments  and  bring  a  wealth  of  insight  into  the  trade-offbetwei'ii  model  bias  (or  spectral 
resolution)  and  estimator  variance 

We  tie  up  our  results  by  illustrating  a  number  of  eqnnalent  implementations  lor  qii.ulratic  forms  m 
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iow-rank  projection  operators. 


1.0  INTRODUCTION 


In  the  theor_\'  o!'  ^vi.ie-sense  -t at ioiinry  time  .'erie.-i.  there  s  tio  eoiihision  ;i.eiir  the  ef  the 

term  power  spectrum.  But  what  does  it  mean  to  estimate  one.  and  what  are  the  moti\aiions  lor  doine  <o': 
For  Schuster  [1]  the  motivation  was  to  find  hidden  periodicities  in  meterolonical  time  senes,  and  therefore 
it  was  natural  for  him  to  form  a  "periodogram "  that  would  peak  when  periodic  components  nf  the  time 
series  matched  the  period  of  his  analyzer.  For  L'instein  i'J].  the  moti\ation  was  to  determine  the  variance 
of  a  Fourier  series  coefficient  in  a  ['eriodic  expansion  of  a  random,  waveform.  Finstein’s  variance  expression 
was.  in  fact,  the  power  spectrum,  lie  showed  that  the  power  spectrum  was  the  Fourier  transform  of  the 
correlation  function  for  the  random  waveform.  This  finding,  refined  and  extended  in  Wiener's  work  on 
generalized  Fourier  analysis  [3],  led  to  the  generally  held  view  that  spectrum  analysis  was  a  problem  of 
correlation  analysis.  This  was  the  point  of  view  adopted  by  Blackman  and  Tukey  [  l]. 

The  Schuster  and  Einstein  views  are  actually  compatible.  The  periodogram  and  the  Fourier  transform  of 
the  estimated  correlation  sequence  produce  identical  estimates  of  the  power  spectrum,  provided  the  so-called 
biased  estimate  of  the  correlation  sequence  is  used.  The  mean  of  the  periodogram  is  a  Bartlett-windowed 
version  of  the  true  spectrum,  and  it  converges  to  the  true  spectrum  as  the  length  of  the  data  window  increases 
without  bound.  The  variance  of  the  periodogram  does  not  converge  to  zero,  meaning  the  periodogram  is  an 
inconsistent  estimator  of  the  power  spectrum.  The  basic  problem  with  the  periodogram  is  that  it  generates 
roughly  independent  point  estimates  of  the  power  spectrum  at  the  same  rale  that  it  acquires  new  data, 
leaving  no  room  for  statistical  averaging.  This  problem  was  understood  by  Daniel!,  who  proposed  the  use  of 
frequency  averaging  to  stabilize  the  variance  of  an  estimator  of  the  power  spectrum  [.5].  The  closely  related 
ideas  of  segmenting,  windowing,  and  averaging,  as  advocated  by  Welch  [6],  are  simply  variations  on  Daniell’s 
theme.  All  of  these  variations  produce  estimates  of  the  power  spectrum  that  are  quadratic  in  the  data. 

In  the  sections  that  follow,  we  develop  a  theory  for  quadratic  estimators  of  the  power  spectrum.  We 
begin  in  Section  2.0  with  a  heuristic  discussion  of  the  aims  and  approaches  of  spectrum  analysis.  We  set  up 
an  idealized  experiment  and  ilhi.->trate  the  role  played  by  bandpa.ss  filters  that  concentrate  power  in  a  narrow 
spectral  band.  We  generalize  this  idea  to  linear  transformat ion.s  that  concentrate  power  and  discover  Slepian 
sequences  [7]  ...s  the  appropriate  basis  for  building  bandpass  snbspaces.  In  Section  3.0  we  use  this  idealized 


''xpt^rimetit  lo  ileri\e  ina.xiiiiuni  likclihooil  I'stiniatcs  for  ilie  power  in  a  narrow  speeiral  haiul  I'lie  e.'iiin.i!ri 
are  ijuadratic  forms  in  tiie  experimental  data.  W’e  analyze  the  mean  and  \ariance  of  the  estimators  and 
sliow  that  tlie  mean-squared  error  of  a  reduced  rank  estimator  depends  inversely  on  the  rank.  We  then  show 
that  rank  retluction  is  a  necessity  in  any  practical  approacli  to  power  estimation.  By  introducing  the  idea  ol 
.'omple.x  modulation,  we  derive  Thomson's  [S]  family  of  low-rank,  swept-freqiiency  qiiadintie  esilmators  e! 
the  power  spectrum. 

Quadratic  estimators  arise  as  sufficient  statistics  in  an  idealized  experiment  for  measuring  power  in 
a  narrow  spectral  band.  In  Section  4.0  we  prove  a  representation  theorem  for  quadratic  estimators  oi 
the  power  spectrum  that  are  non-negative  and  modulation-invariant.  In  our  view,  these  are  minimum 
requirements  for  any  estimator  of  the  power  spectrum,  quadratic  or  not.  We  show  that  the  resulting  class 
of  quadratic  estimators  scales  correctly  when  the  data  is  scaled  and  preserves  the  llermiiian  symmetry  oi 
the  power  spectrum  under  time  reversal.  We  compute  the  mean  and  variance  of  any  quadratic  estimator 
of  the  power  spectrum  and  derive  a  simple  bound  on  the  mean-squared  error  of  a  reduced  rank  ijuadratic 
estimator.  The  result  brings  insight  into  the  iradeolf  between  resolution  and  variance  and  produci  s  rosuits 
much  like  the  multi-window  spectrum  estimators  of  Thomson  [8],  Our  quadratic  estimators  subsume  all  ol 
the  classical  windowed  and  smoothed  estimators  commonly  considered  for  spectrum  estimation,  including 
those  of  Schuster  [1],  Daniell  [-a],  Welch  [6],  Clergeot  [9],  and  many  others.  In  one  of  its  more  illuminating 
forms,  the  quadratic  estimator  matches  the  maximum  likelihood  estimator  for  estimating  the  power  in  ,a 
narrow  spectral  band.  In  this  form  the  estimator  projects  data  onto  a  low-rank  subspace  and  uses  the  power 
in  the  subspace  cis  an  estimate  of  power  in  the  narrow  spectral  band.  The  projection  operator  is  constructed 
from  Slepian  sequences.  The  net  result  is  that  we  have  at  least  three  interpretations  of  Thomson's  multi¬ 
window  spectrum  estimator,  each  of  which  brings  its  own  insights:  (i)  it  is  the  maximum  likelihood  estimator 
of  the  power  in  a  narrow  spectral  band;  (ii)  it  is  a  projection  onto  a  low  rank  subspace,  followed  by  a  power 
computation;  and  (iii)  it  is  a  reduced  rank,  frequency  averaged  periodogram  (a  reduced  rank  version  of 
Darnell's  smoothed  periodogram). 

We  tie  lip  our  results  for  quadratic  estimators  of  the  power  spectrum  by  illustrating  a  number  of 
equivalent  im|>lementations.  I'liese  implementations  use  projection  oiierators,  or  their  clo.se  (oiLSiiis,  Slepiaii 
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windows,  and  discrete-time  Fourier  transforms.  All  ofonr  re.suh.s  may  he  extcndeil  in  a  .si rni',;lit for w.trd  w.iy 
to  the  estimation  of  power  in  nonuniformly  sampled  time  series  '10],  [11], 


2.0  HEURISTIC  BEGINNINGS 


Spectniiii  estimation  may  be  divided  into  two  broad  categories:  (i)  estimation  of  [mwer  in  a  narrow 
spectral  band,  and  (ii)  estimation  of  a  parametric  spectrum  model.  The  first  category  is  often  called  classical 
spectrum  analysis,  and  the  second  is  often  called  modern  spectrum  analysis.  One  of  our  objectives  in  this 
chapter  is  to  show  that  there  are  plenty  of  modern  ideas  in  classical  spectrum  analysis. 

Category  (i)  dominated  the  early  work  on  spectrum  estimation,  beginning  with  the  original  work  of 
Schuster  [1]  and  proceeding  to  the  current  work  on  windowed  and  smoothed  periodograms.  In  our  view,  this 
category  represent'  the  view  of  spectrum  analysis  that  most  closely  captures  the  essence  of  the  problem, 
namely. 

"from  a  finite  record  of  a  wide-sense  stationary  time  series,  estimate  how  the  total  power  is  distributed 

over  narrow  spectral  bands.” 

The  essential  problem  is  to  sweep  a  narrow-band  spectrum  analyzer  through  the  Nyquist  band  in  such  a 
way  that  the  band  is  highly  resolved  and  the  estimates  of  power  in  narrow  spectral  bands  have  low  variance. 
Thomson’s  1982  paper  [8]  is  the  definitive  work  on  multi-window  spectrum  estimation.  It  has  brought  new 
insight  into  this  problem  and  stimulated  renewed  interest  in  classical  spectrum  estimation. 

Category  (ii)  has  dominated  the  engineering  literature  since  the  publication  of  Burg’s  1967  paper  on 
maximum  entropy  and  autoregressive  models  [12].  The  key  idea  is  to  assume  that  the  power  spectrum 
Sxiie^^)  belongs  to  some  parametric  class  such  as  the  autoregressive  class  and  then  to  identify  the  parameters 
of  the  model  from  a  record  of  measurements.  Not  surprisingly,  there  are  connections  between  categories  (i) 
and  (ii).  The  first  was  established  by  Parzen  [14]  in  his  original  advocacy  of  autoregressive  (AR)  spectrum 
analysis  as  a  method  to  smooth  rough  periodograms.  Parzen  was  also  the  first  to  recognize  the  importance  of 
order  selection,  which  is  akin  to  rank  reduction,  for  the  control  of  approximation  error  in  spectrum  analysis. 
In  Section  3.0  of  this  chapter,  we  shall  establish  a  second  connection.  We  shall  shosv  that  the  problem  of 
estimating  power  in  a  narrow  spectral  band  may  be  rephrased  as  a  problem  of  estimating  parameters  in  a 
structured  covariance  matrix.  This  point  of  view  has  previously  been  exploited  indirectly  by  Burg  et  al  [13] 
and  more  directly  in  [1G]-[18].  The  resulting  estimator  of  power  in  a  narrow  spectral  band  is  a  projection- 
based  spectrum  analyzer  that  may  be  interpreted  as  a  reduced  rank,  frequency  averaged  periodogram. 
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2.1  The  Power  Spectriun  and  Quadratic  Estimators 

The  variance,  or  power,  of  a  wide-sen.se  stationary  (WSS)  time  series  {j-,}  may  be  written  as  tlic  integral 

IT 

'•0  =  j  (1) 

—  W 

wliere  Sti(€^^)  is  the  power  spectral  density  of  the  time  series  and  Sii(e^'^)  ^  is  the  infinitesimal  power  in 
the  band  {o— 4f<0<<i-f4^}.  This  infinitesimal  power  is  just  one  variance  component  in  an  infinite  set  of 
infinitely  resolved  variance  components.  It  is  surely  unreasonable  to  estimate  such  an  infinite  set  of  variance 
components  unless,  of  course,  the  set  is  smoothed  or  finitely  parameterized.  This  suggests  two  approaches, 
corresponding  to  the  classical  and  modern  theories  of  spectrum  analysis. 

The  first  appioach  is  to  replace  the  infinitely  resolved  variance  components  ^  by  the  finitely 

resolved  (or  smoothed)  variance  components 

ro(ti)=  J  (2) 

B(^) 

The  band  B{<^)  is  typically  a  passband,  and  ro(<^)  is  the  power  in  the  band.  This  formulation  shows  spectrum 
analysis  to  be  a  problem  in  the  analysis  of  variance  (AXOV'A),  a  problem  where  the  estimation  of  infinitesimal 
variance  components  is  replaced  by  the  estimation  of  finite  variance  components. 

■  The  second  approach  is  to  use  an  understanding  of  the  underlying  physics  to  parametrically  describe  the 
spectrum,  and  use  a  statistical  theory  to  identify  the  parameters.  This  formulation  leads  to  the  estimation 
of  autoregressive  (AR),  moving  average  (MA),  and  autoregressive  moving  average  (ARMA)  power  spectra. 
The  resulting  theory  is  closer  to  the  theory  of  rational  model  identification  (from  output  data)  than  it  is  to 
the  theory  of  spectrum  analysis. 

If  we  choose  our  parametric  model  of  the  spectrum  to  model  power  in  a  narrow  spectral  band,  we  can 
bridge  the  gap  between  these  two  approaches. 

2.2  Ideal  Filters 

When  we  represent  the  power  of  a  time  series  by  its  variance  components,  we  interpret  fo(d)  to  be  "that 
part  of  the  total  power  that  lies  in  band  D{o)."  But  this  is  evidently  “the  power  in  that  component  of  the 
time  scries  that  lies  in  band  D{d)."  How  can  we  estimate  this  power?  If  we  had  an  infinite  record  of  data  and 
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infinite  computing  resources,  we  could  pass  the  data  (or  signal)  through  the  ideal  bandpass  filter  illustrated 
in  Figure  1.  The  frequency  response  of  the  ideal  filter  ///?tr)  is 


//file'")  = 


1.  9^B{o) 
Q,  9  i  5(0), 


where 


The  filtered  signal 


5(0)= 


{ya  =  //B(-'){x,} 


(3) 


(1) 


(5) 


is  surely  what  we  mean  by  tlie  phrase  “that  part  of  the  signal  {x,}  that  lies  in  band  5(0)."  The  variance  of 
yt  is  the  power  in  the  band: 


ro(0)  =  riy,!'  =  J  Srz(e^^)  ^ 

Bio) 

=  £  i- 


(6) 


B(«) 


This  result  shows  that  the  power  in  the  band  depends  on  the  entire  covariance  sequence  for  the  time 

series  {xt}.  If  one  adopts  the  point  of  view  that  spectrum  analysis  is  a  problem  in  correlation  analysis,  then 
one  is  forced  to  estimate  an  infinite  number  of  correlations  or  to  impose  a  parametric  model  for  extending 
them  outside  some  range  of  indexes  for  which  they  can  be  estimated. 

Under  an  appropriate  ergodic  assumption,  the  estimator 

M 


2M  +  1  £ 


|ytl 


is  a  consistent  estimator  of  the  narrow-band  power  ro(d).  This  estimator  may  be  written  as  the  quadratic 
form 

oo 

(f^) 


ro(<?)  =  Z  •fn  i 

n,m=:  —  co 


where 


1 


•ni  + 1 


^  —  — m- 


(9) 


f=  -  M 


In  the  limit  as  M  —  oc,  the  doubly  indexed  sequence  qnm  depends  only  on  u  —  in.  and  the  quadratic  form 
is  Toeplitz  Grenander  and  Rosenblatt  call  such  a  Toeplitz  quadratic  form  a  spectrogram  [19], 
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2.3  Complex  Demodulation  and  Froqticncy  Sweeping 

^Vhcn  the  time  series  {rj}  is  real,  as  we  shall  assume,  then  the  power  spectral  density  Srrif^^) 
symmetric  about  0  =  0.  The  power  in  the  narrow  spectral  band  B(d)  is  twice  the  power  in  the  upper 
sideband  B+{o)  illustrated  in  Figure  2(a),  But  the  power  in  this  sideband  is  just  the  power  in  the  complex 
demodulated  signal  that  lies  in  the  baseband  D  =  {O  :  —3ir  <  9  <  3^]-  This  property  follows 

from  the  fact  that  is  the  power  spectral  density  of  the  demodulated  signal,  and  the  power  in 

the  upper  sideband  may  be  written  as 

o+Jt 

1  /■  ,  do 

2'-q(0)-  J  brAe  )2^ 

(10) 

=  J  5.x(€'^*+®^) 

— 

See  Figure  2(b)  for  an  illustration.  This  finding  permits  us  to  replace  the  idealized  experiment  of  Figure  1 
by  the  idealized  experiment  of  Figure  3.  In  this  experiment,  the  time  series  {X(}  is  complex  demodulated  in 
order  to  sweep  the  upper  sideband  down  to  baseband  where  it  is  filtered  by  the  ideal  baseband  filter  H(z). 
The  estimator 

is  a  consistent  estimator  of  the  power  in  upper  sideband  fl+(d)  and  of  one-half  the  power  in  passband  D(3). 


2.4  FIR  Filters 

The  basic  idea  embodied  in  the  alternative  experiment  of  Figure  3  may  be  turned  into  a  practical 
alternative  if  we  replace  the  ideal  baseband  filter  //b(~)  "’ith  a  causal,  finite-dimensional  FIR  filter  //(:): 


;V-1 


where 


-  n  --2 


>l/(-')  =  [1  r 


//(--)  =  ^ 

n  =  0 

=  h^'K--). 


(12) 


;i3) 


If  the  filter  //(:)  is  to  have  properties  analogous  to  iliose  of  the  ideal  bandpass  filter,  then  it  should  be 
as  frequency  selective  as  we  ran  make  it.  One  way  to  measure  frequency  selectivity  is  to  pass  while  noise 
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ihrougli  the  filter  and  measure  the  output  variance; 


f  I  ,,,  }i\\-  1  Ti 

To  =  /  //{e-'  )  —  =  a  h. 

J  2  T 


The  part  of  this  power  that  lies  in  baseband  D  is 


If  we  use  the  representation  H{:)  =  h^'I'(r),  we  may  write  this  power  as 


ro(S)  = 

i  (IG) 

= 

where  the  Toeplitz  matrix  R  depends  only  on  the  baseband  B: 

R  =  J  =  {r^_^]  (17) 

B 

0ir 

frn-a  =  J  ^  =  dsinc[i5T(7n  -  n)] . 

-0^ 

We  would  like  to  make  //(;)  as  frequency  selective  as  we  can  by  maximizing  ro(5),  under  the  constraint 
that  To  =  h^h  =  1.  This  leads  to  the  following  problem: 

max  h^fZii  subject  to  the  constraint  h^h  =  1.  (18) 

h 

The  solution  is  to  make  li  the  dominant  eigenvector  ui,  corresponding  to  the  maximum  eigenvalue  Aj,  of  R: 


R  =  UA-U'^  {RU^UA) 


b'  =  [uiuo  -u.v]  A  =  diag(AfA;  •  •  Afv)  A];  >  A;  >  •  ■  •  >  A^v 

The  eigenvectors  of  R  are  the  Slepian  sequences  featured  so  prominently  in  Thomson's  paper  [8]. 
In  summary,  the  most  frequency  selective  FIR  filter  h  for  band  B  is  the  filter 


//(;)  =  h^'I'(c),  (20) 

where  li  =  iii  is  the  doniinant  Slepian  sequence  for  baseband  B.  \\’hen  {i/,}  is  the  output  of  this  FIR  filler. 


the  consistent  estimator 


=  ItTTT  £ 
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is  a  I'oeplitz  quadratic  form  as  long  as  M  >  .V. 


2.5  Linear  Transformations 

Tliese  results  may  be  generalized  by  considering  a  problem  that  is  closer  yet  in  spirit  to  practical 
estimation  of  power  in  a  spectral  band.  The  idea  is  to  replace  the  FIR  filter  l!{z)  by  a  linear  transformation 
II  that  maps  an  .V-dimensional  record  of  the  modulated  time  series  into  an  m-dimensional  record 


of  transformed  measurements: 


where 


y  = 


y  =  L'^o  yi  ■  ■  X  =  [ro  -Cl  •  •  •  £)(e^'^)  =  diag[l  e 


JO  .  .  . 


The  frequency  selectivity  of  the  linear  transformation  II  is  measured  by  passing  white  noise  x  through  it 

and  measuring  the  average  output  variance  over  the  m  output  components: 

ro  =  -  Elly]!-  =  -Elr[//xx^//^] 
m  m 


=  ^tr[//  J  ^//^]. 


The  part  of  this  variance  (or  power)  that  “resides  in  band  B"  is 


ro(i?)  =  -tr[//  /  — 

m  y-  J  27r 


=  — tr[//R//^], 
m 

where  R  is  the  matrix  defined  in  Section  2.4. 

If  the  linear  transformation  is  to  have  properties  like  those  of  the  linear  filter  //(;),  then  w'e  would  like 
it  to  maximize  ro(i?),  under  the  constraint  that  each  row  of  H  liave  unit  norm: 

max  tr[///?//^]  subject  to  the  constraints  cjcn  =  1  (n  =  1,2, .  . .  ,  rrr).  (27) 


The  problem  is  to  find  the  saddle  points  of  the  Lagrangian 


£  =  ^  vJ,Rc.n  -  ^ //„(<•., -  1). 
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(29) 


Nt'Ci'ssarv  conditions  are 


HCfi  — 


for  each  n.  Tliis  makes  c„  the  n'^  dominant  eigenvector  of  i?  and  fJn  the  corresponding  eigenvalue  A'.  Tlie 


resulting  power  in  D  is 


1  1 

ro(S)  =  -tT[IIRH'^]=  - 

in  1-n  *  ^ 


The  linear  transformation  H  may  be  written  as  a  matrix  of  Slepian  sequences: 


2.G  An  Illiiininating  Example 

Let's  try  to  illustrate  the  frequency  selectivity  of  the  linear  transformation  //.  The  variance  tq  may  he 


rewritten  as 


ir 

ro  =  ^tr[//  J 

—  » 


L''„(e''*)|'  =  :  frequency  response  of  the  n'*’  Slepian  sequence. 


The  component  of  this  variance  that  lies  in  band  B  is 


So,  the  frequency  .selectivity  is  measured  by  the  average  of  the  frequency  selectivities  for  the  rn  eigenvectors 


Let's  choose  X  =  01  and  B  =  {0  :  — <  0  <  .?t,  3  =  1/10}.  The  restilting  time  bandwidth  [iroduct 
IS  XJ  =  1.  The  number  of  dominant  eigenvalues  of  R  is  '1  (the  time-bandwidth  product)  The  fre(piency 
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!•  'ponso  of  >o\''r.il  dominant  and  subdommant  Slopian  sequences  is  illnslratod  in  Tiguro  1.  Tin'  fri'iiiioiu') 
response  of  the  first  four  eigenvectors,  namely 


(3!) 

n  =  1 


i>  j'loarj  ill  Figure  5.  logeihor  with  the  frequency  response  of  the  remnining  sixty  eigenvectors,  namely 

64 

£  |f'n(e'')F  (331 

n  =  5 

These  results  show  that  the  dominant  eigenvectors  concentrate  power  on  the  band  D.  The  subdominant  ones 
do  little  in  the  way  of  concentration.  In  fact,  their  frequency  selectivity  is  concentrated  out-of-band.  The 
results  of  Figure  -5  may  also  be  interpreted  as  follows.  The  frequency  response  of  an  identity  transformation 
is 

.V 

±1  (30 


where  P  is  the  rank-m  projection  built  from  the  dominant  Slepian  sequences.  The  frequency  response  of  the 
projection  is  just  the  frequency  response  of  the  dominant  eigenvectors: 


m 

(37) 

n=  1 

where 

m 

P  =  Y.  (3S) 

n  =  l 

It  is  clear  that  the  linear  transformation  11  characterizes  the  projection  P,  whose  range  is  a  low-rank  subspace 
where  most  of  the  power  in  the  band  B  is  concentrated.  This  is  illustrated  in  Figure  6. 


2.7  Maximum  Entropy  Interpretation 

There  is  another  way  to  describe  tlie  m  x  .V  linear  transformation  11  derived  in  Section  2.5.  It  is  a  linear 
transformation  that  produces  a  maximum  entropy  version  of  white  noise,  under  the  constraint  that  the  rows 
of  //  are  normalized.  The  resulting  entropy  of  y  =  IIx  when  x  is  uhife  is 

S  =  -  In  [(2Tc)-"'/-(det////^)-'/-] 

m  ,  f3'3> 

=  —  ln(2Tc). 
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The  raiuloni  vector  y  =  //x  is  also  as  white  ais  it  can  be,  with  covariance 


=  Eyy^  =  E  =  /. 


(10) 


and 


its  prediction  error  is  as  large  as  it  can  be. 

The  norm-squared  of  y  may  be  written  as  a  quadratic  form  in  the  projection  operator  P: 

y^y  = 


=  x^Px  =  ||Px||-. 


(-11) 


The  linear  transformation  //  that  maximizes  entropy  also  characterizes  the  m-dimensiona!  subspace  (//^) 
where  the  largest  fraction  of  the  power  in  band  B  lies  (and  vice  versa).  This  power  is  approximately  the 
integral  of  the  tti-dominant  frequency  responses  |(./„(e-’*’)|‘  over  the  band  B: 

Ey^y  =  tr[P] 


n=l  nsl 


(•12) 


2n’ 


U 


3.0  MAXIMUM  LIKELIHOOD  ESTIMATION  OF  POWER 

We  shall  begin  our  study  of  ma.vimum  likelihood  theory,  and  its  application  to  .spectrum  analysis, 
by  returning  to  the  experimental  setup  illustrated  in  Figure  1.  However,  we  n'^w  ask  what  hapjtens  when 
only  the  finite  snapshot  y  =  [vo  Vi  ■  ■  ■  is  available  to  the  experimenter. 

Let  us  assume  that  the  time  series  (j,}  is  Gaussian  and  widc-sense  stationary  (\N'SS).  with  mean 
zero,  covariance  sequence  {e,}.  atid  power  spectrum  The  snapshot  y  is  then  a  normal  random 

vector  Its  tnean  is  zero  and  its  covariance  matrix  Ryy  is  related  to  the  power  spectrum  of  {j(}  as  follows: 

/f,,  -£-yy^=  y  (-Id) 

BiO) 

The  vector  'I'(e-’^)  is  called  a  Fourier  vector;  when  evaluated  at  ^  =  2-jm/.\.  m  =  0, 1 . V  —  1.  it  is  called 

a  DFT  vector. 

Our  shorthand  tiotation  for  describing  the  snapshot  y  is 


y  :  .V[0.  Ryy]. 


(■U) 


It  is  clear  that  the  information  about  the  power  in  band  Dio)  can  be  carried  only  in  the  covariance  structure 
of  y.  The  covariance  matrix  Ryy  is  Toeplitz,  meaning  that  the  power  in  band  D{6)  is  related  to  Ryy  as 
follows: 


^o(o)  = 


I 


2t 


(lo) 


The  problem  of  estimating  the  power  ro(o)  in  the  band  B(<^)  is  a  problem  in  estimating  the  trace  of  Ryy. 
In  maximum  likelihood  theory,  the  invariance  theorem  shows  that  the  problem  of  estimating  Ryy  is  a 
problem  of  estimating  Ryy  and  then  finding  its  trace.  This  brings  us  to  a  discussion  of  maximum  likelihood 
estimation  of  covariance  matrices.  In  the  discussions  to  follow,  we  replace  ro{<i>)  with  the  notation  tq  in  order 
to  simplify  notation.  But  tq  is  always  ro(o),  and  estimates  ro  are  always  estimates  ro(c>). 


3.1  Maximum  Likelihood  Estimation  of  the  Covariance  Matrix 

The  flow  of  idea.s  for  parametric  spectrum  analysis  is  illustrated  in  Figure  7.  In  the  figure,  Syy(<'^^) 
and  r,  denote  the  .spectrum  and  correlation  .secp.ience  of  {;/(}  If  the  measurements  are  drawn  from  this  wide- 


sense  stationary  time  series,  then  th'’  covariance  matrix  for  the  measurements  m.w  be  constructed  from  the 


covariance  sequence.  The  covariance  matrix  is  denoted  Ryy  If  tlie  measurement  sequence  y  is  multivariate 
normal,  denoted  y  :  .V  [O. ,  then  the  covariance  matrix  Ryy  may  be  estimated  from  a  criterion  like 
ma.ximum  likelihood.  The  estimated  covariance  matrix  is  then  used  to  estimate  the  spectrum.  Generally,  the 
covariance  matrix  Ryy  has  some  special  structure,  with  unknown  parameters.  Procedures  such  as  maximum 
likelihood  for  identifying  the  parameters  are  generally  very  complicated  because  the  parameters  are  imbedded 
in  the  determinant  and  inverse  of  a  covariance  matrix.  However,  if  the  time  series  is  autoregressive,  linear 
prediction  approximately  solves  the  maximum  likelihood  problem  [13], 

In  the  theory  of  ma.ximum  likelihood  (ML),  information  about  the  covariance  matrix  is  carried  in 
the  log  likelihood  function  L{Ryy\y): 


L{Ryy-y)  =  ln(2T)  -  ^  In  det  Ryy  -  RyyV  (16) 

The  structured  covariance  matrix  Ryy  we  denote  by  Ryy{Q^,  where  ^  is  a  set  of  parameters  ^  =  {OiO-2  ■  ■  ■  Op). 
The  maximum  likelihood  equation  for  determining  the  maximum  likelihood  estimate  Ryy  =  Ryy{^  is  [15] 

tr[/?,-;(«vy-yy’')^]  =0;  (U  =  1.2 . r).  fl7) 

The  Fisher  information  matrix  is 


J  —  {J mn  } 


J mn  —  .A  I 


-1  dRyy  1  dRy, 

2“L“!'!'  de^  de„ 


and  the  Cramer-Rao  bound  on  any  unbiased  estimator  of  £  is 


(-18) 


M  =  E{i- 0)'^  >  J-y 


(49) 


Any  estimator  that  achieves  the  bound  is  efficient  and  minimum  variance  unbiased  (M\'UB). 

In  the  following  section  we  show  how  to  apply  these  general  results  to  the  problem  of  estimating 
power  in  a  narrow  spectral  band. 


3.2  Unstructured  Covariance 

When  no  structural  information  about  the  covariance  matrix  Ryy  is  known,  then  the  .ML  estimate 


of  R.jy  IS 


Ryy  =  yy^. 


(50) 
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and  the  corresponding  ML  estimate  of  the  power  in  band  B(d)  is 

1 


^0  =  -y  1 

^  T 

=  jy  y. 


Tlie  mean  and  variance  of  the  estimator  are 


Ero  =  ^tr[£yy^]  =  jjlrlRyy] 


B(<i) 


(51) 


(52) 


varro  =  (53) 

This  is  a  very  primitive  estimator  of  the  power  that  exploits  none  of  the  prior  information  that  we  have 
about  Ryy.  For  example,  we  know  that  Ryy  is  Toeplitz,  and  this  information  can  be  used  [15].  But  more 
than  this,  we  know  that  Ryy  should  have  a  special  spectral  representation  that  it  inherits  from  the  ideal 
bandpass  filter  This  is  the  idea  that  we  exploit  in  the  next  section. 


3.3  Structured  Covariance 

The  real,  symmetric,  non-negative  definite  covariance  matrix  Ryy  has  the  spectral,  or  orthogonal, 
representation 

Ryy  =  U'^U  =  I  (54) 

(' =  [uiuo  ■  ■  ■  u,v];  D"  =  diag((T^(T;  •  •  •  cr^).  (55) 

We  shall  assume  that  U  is  a  known  orthogonal  matrix  and  is  an  unknown  diagonal  matrix.  We  shall 
represent  the  diagonal  matrix  Z~  as 


Z^  =  roA-;  A2  =  diag(AjA^-A^) 


nsl 


(56) 

(57) 


With  this  representation  of  Ryy,  the  unknown  parameters  are  t-Q,  the  power  in  the  band,  and  the  nuisance 
parameters  {A^}‘[^.  The  normalization  of  the  parameters  {A*}^  preserves  the  connection  between  tq  and 


.V 


Co 


-^tr[/fjJ  -  Co  ^tr[.\-] 


=  Co. 


(58) 
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The  log-likelihood  function  takes  the  very  special  form 


jy  N 

in(2T)-  i  In  roA^  -  -^yl^nyn- 


In  this  likelihood  formula,  P„  is  a  rank  one  projection  onto  the  subspace  spanned  by  the  eigenvector  u„ 


and  yJPnyn  is  one  of  N  sufficient  statistics: 


VnPnVn  ■  sufficient 


P„  =  u„u„ . 


This  finding  is  important  because  it  shows  that,  whether  or  not  we  use  a  ML  theory,  the  quadratic  forms 
y^Priyn  ^re  sufficient  statistics  for  the  parameters  Tq, 

Let’s  maximize  likelihood  under  the  constraint  that  the  {A‘}'^  average  to  one.  The  appropriate 


Lagrangian  is 


£  =  -  5;  ln(roA2)  -  ^  ;^yn^ny  "  E  "  l)  ' 


The  gradients  of  C  with  respect  to  tq  and  may  be  equated  to  zero  to  produce  the  ML  equations 

SjC  I  \  ^  1  7’ 

—  =  Q-  _ yTp  y  ^  J--  IL  -  0 

dXl  (ro^ir-  \l  N 

1  1  II 

•  —  — v^P  V-  1  -  -^A^  -  0 
■  ro  A2  ^  yv  " 

The  constraint  may  be  invoked  in  the  second  of  these  equations  to  show  that  the  constraint  plays  no  role  in 


the  minimization: 


n  =  l  "  n  =  l 


,,  =  0, 


The  maximum  likelihood  estimates  of  tq  and  A^  are 


n=:l 


K  =  -y^Pny. 
^0 
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Tliere  are  actually  two  important  results  here.  First,  if  the  are  known,  then  the  ML  estimate  of  vq  is 


1  ^  1  T  p 

'•0  = 

'  n=t  '• 


(C6) 


Second,  if  the  {A*  }i  are  unknown,  then  the  normalizing  equation  may  be  used  to  rewrite  ro  as 

N  ,  ,  N 


y^PnV 


(67) 


n=:l 


n=l 


1  T  ^  T 

ro  =  —  2^y  Pny=  y. 

n=  1 

These  two  findings,  illustrated  in  Figure  8,  show  that  the  ML  estimate  of  the  power  in  band  5(ci)  is  quadratic 
in  the  data. 


3.4  Mean  and  Variance 

The  mean  and  variance  of  our  two  estimators  are  computed  from  the  mean  and  variance  of  the 
quadratic,  sufficient,  statistics  y^PnY-  The  rank-one  projection  Pny  is  distributed  as 

Pny  :  MO,roA*A„]  (68) 


An  =  diag(0  -O  1  O-O), 

This  makes  the  quadratic  form  y^ Pny  =  I'f’nylP  a  scaled  chi-squared  random  variable  with  the  following 
mean  and  variance: 

Ey'^P„y  =  roXl  (69) 


var[y^P;.y]  =  2(roA2)2. 

It  is  easy  to  see  that  the  sufficient  statistics  are  uncorrelated  and  therefore  independent: 

E(Pr,y)(Pmyf  =  PnRPm 

This  means  that  we  may  add  variances  at  will  to  compute  the  variance  of  tq. 

Unknown  {A^}f'.  When  the  eigenvalues  of  R  are  unknown,  then  the  ML  estimate  of  tq  is 

,v 


(70) 


(H) 
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The  mean  and  variance  of  this  estimator  are 


1  ^ 

Ei-o  =  =  '’0 

n=:l 

varro=  ^^2(roA2)2  =  ^ 


It  is  interesting  to  ask  which  distribution  of  the  unknown  minimizes  tlie  variance,  under  the  constraint 

.V 

that  A^  =  1.  The  obvious  answer  is  A^  =  1  for  all  n,  in  which  case 


var  To  = 


This  question  is  much  more  fascinating,  and  the  answer  more  profound,  when  the  estimator  is  replaced  by 
a  low-rank  approximant.  We  take  up  this  question  in  Section  3.5. 

Known  {A^}.  When  the  eigenvalues  of  R  are  known  up  to  the  scale  constant  tq,  then  the  ML 


estimate  of  tq  is 


The  mean  and  variance  of  the  estimator  are 


n=l 


r,  =  l 

varfo  =  ^^^2(roA2)2=  (78) 

This  latter  finding  is  very  important  because  it  establishes  ihe  lower  bound  on  the  variance  of  any  unbiased 
estimator  of  the  power  in  band  Why  do  we  say  this?  Because  the  idealized  experimental  setup 

illustrated  in  Figure  1  preserves  all  information  about  the  power  in  the  band,  and  the  snapshot  contains  all 
of  the  information  that  one  can  get  in  a  finite  snapshot.  The  assumption  that  the  eigenstructure  of  R  is 
known  up  to  a  multiplicative  constant  in  the  eigenvalues  represents  as  much  apriori  knowledge  as  one  can 
have  about  the  spectrum  Sxx(e^^)  without  giving  away  its  power  in  the  band.  Without  prior  knowledge  of 
the  normalized  eigenvalues  {An}i^,  the  minimizing  distribution  of  eigenvalues  produces  the  same  minimum 
variance.  Thus,  for  any  unbiased  estimator  of  the  power  in  the  band. 
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3.5  Reduced  Rank  Estimators 

T''  "  Cbtimator'^  \'  P  tia.v  far  der'vcH  ar.‘  fcrtivo  ir.  the  seiiS."  that  they  require  detailed  kiiowlou^c 
of  eigenstructure  for  their  implementation.  Typically,  such  knowledge  is  hard  to  come  by  because  eigenstruc- 
ture  for  subdominant  spectral  modes  is  notoriously  unreliable.  So,  what  if  we  replace  our  ML  estimators  by 
low-rank  appro.ximations  of  them? 

Unknown  The  obvious  low-rank  approximation  of  fo  is 

’"0  =  (SO) 

^  n=l 

The  mean  and  variance  of  the  low-rank  estimate  are 


^^0=1^  =  '■O  ^ 

n  =  l  n  =  l 

varfo  =  .^^2(roA2)2  =  ^ 

n=l  n  =  l 

The  mean-squared  error  of  this  reduced  rank  estimator  is 

MSE(rTj)  =  E(ro  -  ro)‘  =  (E  i-o  -  rof  +  var  fo 

N 


n  =  l  n  =  l  n  =  l 


n  =  l 
.2  .  'V 


n  =  l 

2ro^ 


(81) 


(82) 


(83) 


n=:m4-l 


nr:  1 


This  mean-squared  error  may  be  minimized  with  respect  to  the  {A^}  by  minimizing  the  Lagrangian 


n=m  +  l 

The  resulting  regression  equations  are 


/V  2  iTi  ,  S 

c  =  {  Yi  +2E(-'i)’-<'(7fi;-'t-i) 


(84) 


n=l 


n  =  l 


dC  o  u 
n<m:— =  4A^-^  =  0 


:  A^  = 


4.V 


N 


n  =  m+l 


V  X-  =  ^ 
”  4.V 


n=:m+  I 
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Invoke  the  constraint  to  solve  for  /r: 


—  A-  —  —  4-  —  =  1 

J\  +  _y  ^_y  “ 


(S5) 


n=l 


M  = 


4N^ 


N 


m  +  2  ’  4iV  m  +  2 

The  resulting  minimum  value  of  mean-squared  error  is 


o 

IL 

r  1 

2 

4-^m 

A’2 

km +  2^ 

+  iV2 

km-f  2/ 

2r2 


(86) 


m  -t-2 

This  result  establishes  the  lower  bound  on  the  mean-squared  error  of  any  low-rank  estimator  of  the  power 
in  a  band. 


Known  {A„}‘i  .  The  estimator 


'’0 


,1-^1  Tr. 


nsl  “ 


(87: 


is  the  obvious  low-rank  approximation  of  tq  when  the  are  known.  The  mean,  variance,  and  mean- 

squared  error  of  tq  are 


„  .  ^  tn 

Ero  =  ro0  — 

var  ro  =  ^  m 

MSE(m)  =  r5^1  -  -|-2r^/?^^m. 


The  minimizing  value  of  /?  is 


0  = 


N 


m-t-2  ’ 

and  the  corresponding  minimum  value  of  .MSE(m)  is 


1  - 


0m  2 
~  ~  m-l-2’ 


(88) 

(89) 

(90) 

(91) 


MSE(n.)  =  r|(^)%2r5(^) 


ym  +  2> 

2rg 


(92 


m  +  2 


This  result  matches  our  previous  result,  leading  to  the  conclusion  that  the  mean-squared  error  of  any  low-rank 
estimator  of  the  power  in  a  band  is  bounded  as  follows: 


MSE(/5„(e-,|)>;^(/5„(e*,^)’, 


(96) 


B(«) 


0(d) 
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3.6  Complex  Demodulation  and  Frequency  Sweeping 

III  seciiu.i  i.o  we  argued  liiai  ihe  power  in  ihe  lime  series  {r, ]  ihai,  is  concentrated  in  the  narro\\ 
spectral  band  B(c>)  is  twice  the  power  in  the  complex  demodulated  time  series  that  is  concentrated 

in  the  baseband  B.  This  property  follows  from  the  fact  that  the  covariance  sequence  for  the  complex 
demodulated  sequence  is  {e r,}  and  the  power  spectrum  is 

Let’s  assume  that  the  complex  demodulated  sequence  is  filtered  by  the  ideal  baiseband  filter  H(z). 
illustrated  in  Figure  3.  The  complex  snapshot  observed  at  the  output  of  the  filter  is  y  =  D(e“-l^)x,  and  its 
covariance  matrix  is  Ryy  - 

y  =  £>(e--’'^)x  (94) 


Ryy  =  Eyy'  =  J  ^ 

r  M 

We  have  used  the  identity  to  derive  these  two  equivalent  formulas  for  Ryy.  The 

covariance  matrix  Ryy  may  be  rewritten  as 


Die^'^)RyyDi€-^*)=  J  (96) 

The  covariance  matrix  on  the  right-hand  side  belongs  to  a  complex  snapshot  whose  spectrum  is  concentrated 
on  the  upper  sideband  B+{<^).  The  covariance  matrix  on  the  left-hand  side  belongs  to  the  modulated  snapshot 
D{e^'^)y.  The  two  snapshots  are  equivalent. 

The  power  in  the  complex  demodulated  snapshot  y  is  the  power  of  the  original  signal  that  is 
concentrated  in  the  upper  sideband  B+{4))  and  half  the  power  of  the  original  signal  that  is  concentrated  in 
the  narrow  spectral  band  5(d>): 


(97) 


it+ta)  B(«) 

If  the  power  spectral  density  is  slowly  varying  in  the  band  B{<*>),  then  the  translated  spectrum 

.S'j.j.(c-’*^"*'*')  is  slowly  varying  on  the  band  B.  This  means  that  the  power  in  the  narrow  band  B{o)  is 
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approximately 


ratd)=  /  — = 

J  ■^’T 

The  covariance  matrix  Ryy  is  approximately 

«vv  =  J  ^ 

B 

=  S„(e^^)/i=  ^ro(<;i)/2 
where  R  is  the  baseband  covariance  matrix  derived  in  Section  2.5; 

R  =  [  ^ 

J 

B 

The  trace  of  R  is  the  time-bandwidth  product  N0  and  i^tr/Jyy  is  half  roi't')'. 


-^tr[/?j,y]  =  iro(d). 


(99) 


(100) 


(101) 


Our  ML  results  may  be  applied  without  change  to  the  estimation  of  ro(<?)  in  the  model  v^y  :  .V  [o,  ro(o)^/?], 
with  ^U^R  =  1.  For  example,  when  the  eigenvectors  of  R  are  known  but  the  eigenvalues  are  assumed 
unknown,  the  reduced  rank  estimator  of  ro(0)  is 


W)  =  I  ll/’yll^  =  f  ||Ff9(e-^^)x||^  (102) 

m 

P  =  ^  ti„uj;  u„  ;  eigenvector  of  Ru  =  Au. 

n=  1 

This  estimator  uses  a  complex  demodulator  to  generate  the  snapshot  y  and  the  rank-one  projections  illus¬ 
trated  in  Figure  9. 


3.7  Reducing  Rank  and  Removing  the  Bandpass  Filter 

Our  most  refined  experiment  for  spectrum  analysis  is  depicted  in  Figure  10(a).  The  complex 
demodulated  signal  is  passed  through  the  ideal  baseband  filter  H{z),  the  resulting  snapshot  y  is 

projected  through  P  onto  the  subspace  spanned  by  the  m  dominant  eigenvectors  of  R,  and  the  power  in  that 
subspace  is  used  to  estimate  the  power  in  the  narrowband  D{0).  As  illustrated  in  Figure  5,  the  projector 
P  is  very  selective  in  frequency  when  the  rank  m  is  chosen  to  be  the  time  bandwidth  product  m  =  A'd, 
This  means,  for  all  practical  purposes,  that  the  ideal  baseband  filter  H{:)  may  he  removed  to  produce 
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the  pra''tical  diagram  of  Figure  10(b).  The  diagram  of  Figure  10(b)  may  be  redrawn  as  in  Figure  10(c). 
In  a  snapsliot  of  d-'a  '  =  (.roJi  •  ■  -r.v-i)  is  complex  deiiiodulaied  with  (he  demoduiation  iiu'Uri.\ 

D{e~'^)  to  produce  the  complex  demodulated  snapshot  y  that  is  then  projected  onto  a  subspace  where  the 
norm-squared  is  computed  and  scaled  to  estimate  power. 

3.8  Projection-Based  Spectrum  Analysis 
The  spectrum  estimator 

ro(d.)  =  23Srrie^^)  =  -|  | lPD(e-^^)x| |-  (103) 

is  nothing  more  nor  less  than  the  norm-squared  of  a  complex  demodulated  snapshot  that  hais  been  projected 
onto  a  narrowband  subspace.  A  geometrical  interpretation  is  presented  in  Figure  11.  The  complex  demodu¬ 
lation  matrix  L>(e  ■^®)  is  a  unitary,  or  rotation,  matrix  that  simply  rotates  the  measurement  vector  x  into  a 
position  in  complex  Euclidean  space  where  its  projection  onto  the  “baseband  subspace  IJ'm)"  rtuty  be  ■’^^ed 
to  estimate  the  power  in  band  When  this  procedure  is  carred  out  for  all  ii>,  the  pc  ’r  spectrum  is 

mapped  out.  The  result  of  equation  (100)  may  be  rewritten  to  make  it  look  like  Thomson's  [8]  multi-window 
spectrum  estimator.  Let’s  write  the  spectrum  estimator  as 

5xx(e^'^)  =  j|  I 

(10-1) 

n  =  l 

In  this  form,  the  complex  demodulated  vector  D(e~^'*‘)x  is  correlated  with  the  Slepian  sequences  and 

the  squared  magnitudes  are  summed.  This  is  illustrated  in  Figure  12(a).  The  inner  product  D{e~^'^)x 
may  also  be  written  as  follows: 

uj£)(e--’*)x  =  x^lF„'I'(e>'‘),  (105) 

where  \V^  is  the  diagonal  window  matrix  constructed  from  the  ri'^  Slepian  sequence  and  'I'(e-’'*)  is  the  Fourier 
vector; 

U-„  =diag(,/,.(l)H„(2)  ■  .;„(.%'))  (lOG) 

u„  =  (i/„(l)H„(2)  -  u„{.V))^. 


2.5 


Tlio  estimator 


1  1 

r2=rl 


:io7) 


is  a  multi-window  spectrum  estimator  wherein  tlie  sequence  x  is  windowed  with  Slepian  sequences,  and 
the  resulting  sequence  is  used  to  compute  the  windowed  periodogram  Then  m  such 

periodograms  are  averaged,  as  illustrated  in  Figure  12(b). 

One  final  interpretation — the  multi-window,  or  projection-based,  spectrum  estimator  is  essentially  a 
reduced  rank,  smoothed  periodogram.  To  see  this,  consider  the  smoothed  periodogram  originally  advocated 
by  Daniell  [.d]; 


i5(0)l 


/  yli:' 

t  =  0 


XtC 


-jei 


2l 
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;V-1 


'  2!r 


B 


TOS) 


B 

=  -^x'^D(e^^)RD{e-^^)x. 

A  P 

This  kind  of  quadratic  form  will  be  studied  extensively  in  Sections  4.0  and  5  0.  For  the  time  being,  we  simply 

m 

observe  that,  when  the  covariance  matrix  R  is  replaced  by  the  approximation  /?  2  ^  the  Daniell 

n  =  1 

smoothed  j^eriodogram  becomes  the  projection-based,  or  multi-window,  spectrum  estimator 


7  m 

5„(e^^)2— gKD(e-^^)x|^ 

1 

=  ;^l|eD(e-'*)*||“  (100) 

n=l 
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4.0  REPRESENTATION  THEOREM  FOR  QUADRATIC  ESTIMATORS 

The  covariance  sequence  {r,}  and  the  power  in  a  narrow  spectral  band  D[o)  are  tlie  two  most 
elementary  linear  functionals  of  a  power  spectrum  that  are  encountered  in  the  study  of  WSS  time  series. 
The  power  in  a  narrow  spectral  band,  namely 


'•0(0)=  J  Srrie^^)^. 

B(<P) 

characterizes  the  local  power  of  the  spectrum.  The  trigonometric  functionals 

r.  =  /  g 

—  ir 

characterize  the  entire  power  spectrum  via 


(110) 


(111) 


S„(eJ'')  =  (112) 

t 

Linear  Functionals  of  the  Power  Spectrum.  The  e.xperiment  illustrated  in  Figure  13  shows 
how  other  quadratic  forms  may  arise  naturally.  The  sequences  {ut}  and  {t^t}  are  the  outputs  of  two  filters, 
F(z}  and  G(:),  driven  by  the  common  input  sequence  {x,}.  The  zero-lag  '•ross-cos'ariance  between  the 
outputs  is 

£u,vr=  J  (113) 

—  * 

The  quantity  u,i*  is  quadratic  in  the  data  {x(},  and  the  e.xpectation  of  this  quadratic  form  is  a  linear 
functional  of  the  power  spectrum  Siz(e^^).  In  fact,  the  expected  value  of  every  quadratic  form  in  {x,}  is  a 
linear  functional  of  the  power  spectrum  5„(e-’*). 

If  we  take 

F(e^*)  =  G(e^*)=  //B(e'*)  (IM) 

(where  //gfe-’*)  is  the  ideal  bandpass  filter  of  Figure  1),  then  the  outputs  U(  and  vt  are 


U(  =  t'l  =  yt  ■ 


(115) 


The  .sequence  {yi}  is  that  part  of  {x,}  lying  in  the  band  B{o).  Fquation  (113)  is  then 

IT 

£■(!'/, I-)  =  J 


(nr,) 
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which  is  the  same  as  equation  (6). 
If  we  were  to  let 


G(c^‘')  =  1, 


then  y,  =  jti  and  the  right-hand  side  of  equation  (113)  becomes 


which  is  a  “general”  linear  lunciional  of  the  power  spectrum  5j-x(e'’*)'  Every  such  linear  functional  is  the 
expected  value  of  a  quantity  which  is  quadratic  in  the  signal  {ji}. 

Finite  Quadratic  Forms.  Every  quadratic  function  of  the  finite  data  vector  x  =  [xq  xj  •  xv_i]^ 


has  expectation 


where 


«• 

£[x*Qx]  =  J  F 


(e^")5„(e^^)— , 


Jt=0  m=0 

(This  is  a  trigonometric  polynomial.)  Every  such  quadratic  form  delivers  an  unbiased  estimate  of  a  linear 
functional  of  S(e^^).  The  family  of  all  such  quadratic  forms  provides  information  about  the  power  spectrum, 
but  not  enough  to  completely  characterize  it.  Since 


.V-l  JV-l 


f:[x’Qxi= 


Qkm  — m » 


we  see  that  only  those  values  of  r*  in  the  range  -N  <  k  <  N  are  involved.  Estimates  of  r,  outside  this 
range  necessarily  depend  on  some  “model”  or  functional  form  for  5(e-’*)  or  involve  extension  principles  like 
maximum  entropy  (wiiich  forces  a  model  indirectly). 

If  one  does  not  want  to  assume  a  model,  then  it  becomes  necessary  to  admit  that  we  can  know 
something  about  but  that  we  cannot  know  it  completely.  Each  linear  functional  of  Sxj-{e^^)  forces 

it  into  a  hyperplane  in  function  space.  Knowing  the  values  of  n  linear  functionals  forces  the 

intersection  of  n  hyperplanes.  This  is  still  an  infinite  dimensional  space. 


4.1  Estimator  Properties 


28 


Aj.suiniiig  tliat  \vt'  have  nir'-od  tv)  consider  oiil\  (iiiadratic  estimators,  we  must  still  decide  which 
quadratic  form  to  use  and  we  must  decide  liow  compellini;  tlie  results  of  Sections  2.0  and  3.0  are.  .\s  a  tool 
for  eliminating  undesirable  choices,  let  us  list  some  properties  which  power  spectrum  estimators  of  the  form 

Sie-'^x)  =  x'QlO)x  (121) 


should  enjoy. 

Positivity.  The  power  spectrum  should  he  nonnegative  real. 


S{e^^ .x)  >  0 


(122) 


This  condition  will  hold  if  the  matri.x  Q((>)  is  positive  semidefinite.  We  will  use  the  notation 


Q(0)  >  0 


(123) 


for  this. 

The  rest  of  the  properties  we  shall  list  all  involve  linear  transformations  on  the  data  vector  x. 
Amplitude  Scaling.  If  the  signal  x  is  multiplied  by  the  scalar  p,  then  the  power  spectrum  should 
be  multiplied  by  Ipj*.  Therefore. 

Sfe-’^’./zx)  =  |p|-5(c-'^.  x).  (12-J) 

This  property  holds  for  any  quadratic  estimator. 

Modulation  Invariance.  .Suppose  we  construct  a  signal 

yt  =  f"^x,.  (12.3) 


This  is  called  modulation  and  shifts  the  signal  in  frequency  by  the  angle  0.  To  see  this,  consider  the 
autocorrelation  sequence 

(120) 

lakiiig  Fourier  transforms  yields  the  fre.quency  <lornaiM  equivaleni. 

(127) 
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For  finite  data,  the  linear  transformation  in  equation  (125)  lias  matrix  representation 


where 


y  =  D(e-’'^)x, 


=  diag[l.e-'^,e-''-^,...,e>(^-^>^]. 


(12i 


(1291 


The  estimator  property  consistent  with  equation  (127)  is 


5(eJ^D(eJ«’)x)  =  5(e-'(^-'*’,  x). 


(130) 


We  shall  call  an  estimator  with  this  property  modulation  invariant. 

If  5o(e'’*,x)  is  an  estimator  which  does  not  have  this  property,  then  we  can  construct  a  one- 
parameter  family  of  estimators 

5^(e^^x)  =  5o(€^‘'’+^>.T>(eJ^)x).  (131) 

There  is  no  a  pnon  reason  to  prefer  one  member  of  the  family  over  another.  The  average 

W 

S(e^'«,x)  =  y  5^(e>»,x)^  (132) 

—  ir 

would  be  modulation  invariant.  Modulation  invariance  is  not  unique  to  quadratic  estimators.  Most  common 
autoregressive  estimators  have  the  property,  even  when  they  are  constructed  from  the  normal  equations  of 
linear  prediction. 

7-Symmetry.  Suppose  we  construct  the  time-reversal  of  the  signal  {r»}: 


Then 


and 


=  x-t- 


Eyt+kVt  =  »■-*  = 


S,,(e^«)  =  5„(e->»). 


(133) 


[134) 


(135) 


This  produces  something  different  only  if  the  signal  t  is  not  real-valued.  As  estimator  version  of  this  is 


5(e'^7x)  =  S(e-’\x) 


[13(1) 
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(137) 


J  = 


W'e  will  call  such  as  estimator  J -symmetric.  The  matrix 

<9,  1' 

reverses  the  components  of  any  vector  to  which  it  is  applied. 

Band  Limiting.  Let  t/(  be  the  signal  generated  in  the  ideal  experiment  of  Figure  1.  This  is  that 
part  of  the  signal  {x()  which  lies  in  the  band  B{^).  Then 


_  /  5„(e>«),  e  6  Bid) 

.0,  e^Bid). 


:i3S) 


It  is  (unfortunately)  impossible  to  generate  any  value  of  yi  from  a  finite  record  of  the  signal  {x*}.  ^Ve  would 
like  to  write 

y  =  PB^<  (139) 


where  Pb  is  some  matrix  which  projects  the  data  {x()  onto  the  “subspace  of  signals  lying  in  the  band  Bio)." 
Equation  (139)  is  exact  only  in  the  infinite  dimensional  case.  If  it  held  in  the  finite  dimensional  case,  we 
could  demand  a  property  of  the  form 


Sie^\PBx)  =  HBie^')Sie^\x). 


(137) 


Although  this  property  is  unattainable  with  finite  data,  our  estimator  should  approximate  it.  To  the  extent 
that  we  can  construct  some  family  of  “bandpass  filtering”  projection  matrices  for  which  this  holds  down  to 
a  certain  bandwidth,  then  we  can  say  that  we  have  “resolution”  to  that  bandwidth.  Without  it,  we  can 
construct  estimators  which  enjoy  all  the  properties  previously  mentioned  but  which  are  trivial.  An  example 
of  such  as  estimator  would  be 

5(e>*,x)  =  ||x||^ 

We  shall  not  demand  as  yet  that  our  estimator  enjoy  all  the  properties  mentioned  in  the  previous 
section.  To  begin,  we  require  a  quadratic  estimator  which  is  positive  and  modulation  invariant.  With  this 
minimum  requirement,  we  prove  the  following  representation  theorem. 

Representation  Theorem:  Every  non-negative,  quadratic,  modulation  invariant  power  spectrum 


estimator  has  the  form 


where  is  an  m  x  .V  complex  matrix  having  rank  rn  and  D{e^^)  is  defined  in  equation  (129). 


Proof:  If  the  estimate  is  quadratic,  then 

S(eJ^x)  =  x*Q(e)x, 


and  if  it  is  also  nonnegative  then 


Q(d)  >  0 


for  each  6.  In  particular,  (?(0)  >  0  and  can  therefore  be  factored  as 


Q(0)  =  V'T, 


(142) 


(143) 


where  V'  is  m  x  .V  and 


m  =  rank  <5(0). 


If  the  estimator  is  modulation  invariant,  then 

[Die^^)x]'Q(9)[Die^^)x]  =  x'Q{0  -  <»x 


for  all  X,  $,  d>.  This  implies  that 

Q{e  -<*>)  =  D'(e^^)Q{9)D(e^^) 

for  all  9,  4>.  Take  ^  =  0  and  reverse  the  sign  of  <p  to  get 

Q(^)  =  D(e-^^)Q{Q)D[t-’*) 

(144) 

=  D{ei*)V*V  D{e-^^). 

Combining  equations  (144)  and  (142)  yields 

5(e^*,x)  =  x*D{e^^)V*VD{e-^^)x 

(145) 

=  ||V^?(e-^'’)x||^  ■ 

4.2  Multiple  Window  Interpretation 

The  estimator  of  equation  (Ml)  can  be  understood  as  tlie  composition  of  three  operations.  Multi¬ 
plying  the  data  by  D(e~^^)  is  demodulation,  or  shifting  in  frequency,  so  that  what  was  at  angle  9  is  now  ,at 
zero.  Operating  on  the  result  by  V  is  akin  to  passing  the  demodulated  data  through  a  bank  of  rn  lowpass 
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filters.  Tlie  average  energy  of  the  resulting  outputs  is  estimated  by  taking  the  norm  squared.  Tiiis  yields  an 
estimate  of  the  power  in  the  vicinity  o{6. 

We  can  put  the  computation  of  S(e^^,x)  in  another  way  which  allows  for  the  use  of  an  FFT 
operation  to  provide  samples.  Write 


m  iV  —  1  ^ 

5(e>^x)  =  ||V'Z}(e-^^)x(|-=  j". 

>1=1  1=0 

In  this  form,  the  estimator  is  seen  to  be  a  generalization  of  the  classical  periodogram,  which  is 


(M6) 


,v-i 


5i(e 


Xke 


-ike 


k-!' 


(H7) 


In  this  case,  m  =  1  and 


where 


(MS) 


(M9) 


If  the  data  is  first  "windowed” 
periodogram  with  window 


by  multiplying  pointwise  with  a  window  sequence,  then  we  will  have  the 


Su;(e-’^,x)  =  ■^1  XI 

t=o 


(150) 


The  general  quadratic,  positive,  modulation  invariant  estimator  in  equation  (146)  is  simply  an  average  of  m 
windowed  periodograms.  For  this  reason,  it  is  called  a  multiple  window  periodogram.  Since  it  amounts  to 
the  sum  of  m  periodograms,  one  can  efficiently  obtain  samples  of  5(e^^,x)  with  spacing  2vlN  by  employing 
m  FFT  operations. 


4.3  J-Symmetry 

We  shall  call  any  estimator  of  the  form  in  equation  (141)  a  natural  estimator.  It  is  completely 
characterized  by  the  mx  A’  matri.x  V .  This  matrix  is,  liowever,  not  uniquely  determined  by  the  estimator  since 
multiplication  on  the  left  by  a  unitary  matrix  U  will  produce  V'  =  UV  and  leave  the  estimate  unchanged. 
If  r  is  »n  X  m  and 

irr  =  1.  (l.jl) 
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then 


|lV’D(e-J^)xl|-  =  \\UVD(e-^^)x\\\  (152) 

In  general,  llie  matrices  V  and  I"  generate  the  same  natural  estimator  if  and  only  if  there  exists  a  unitary 
matrix  U  for  which 

V'  =  UV.  (153) 

This  observation  is  relevant  to  the  following  question. 

Under  what  conditions  is  a  natural  estimator  7-symmctric — i.e.  it  satisfies  the  identity  (136)'’  If 
we  assume  that  both  equations  (136)  and  (Ml)  hold,  then 

||UD(e--"’)Jx||'  =  ||V'D(e^*)x||‘  for  all  0.x.  (154) 

We  make  the  observation  that  the  matrices  defined  in  equations  (129)  and  (137)  satisfy 

D(e-^^)J  =  (155) 

Thus  the  estimator  is  J  symmetric  if  and  only  if 

||U7D(e-'^)x|f  =  (|UD(e>^)x|f  forall0,x,  (156) 

In  other  words,  V  =  V J  produces  the  same  estimator  as  does  V .  It  must  therefore  have  the  form  U'  =  VV 
for  some  unitary  matrix  U . 

In  summary,  a  natural  estimator  is  also  J-symmetric  if  and  only  if 


VJ  =  UV 


for  some  unitary  matrix  U . 


(157) 


4.4  The  Mean  of  the  Estimator 

The  T-symmetry  and  modulation  invariance  properties  provide  gross  information  about  the  form 
( 1  11)  of  natural  estimators  and  the  matrix  V'  (equation  (157)).  We  do  not  as  yet  have  a  tool  for  understanding 
the  resolution  or  frequency  discrimination  properties  of  the  estimator.  Intuitively,  we  know  that  the  rows  of 
can  be  thought  of  as  FIR  lowpass  filter  responses  and  as  window  functions.  The  resolution  would  then 


31 


be  somehow  related  to  the  bandwidth  of  tlie  frequency  response  functions  of  these  m  filters.  But  wliy  use 

in  competing  lowpass  filters  instead  of  one  “good”  one?  In  order  to  come  to  an  understanding  of  these 

phenomena,  we  are  led  to  evaluate  the  mean  and  variance  of  natural  estimators.  The  mean  will  depend  on 

S(e^^)  and  the  matrix  V .  The  same  will  be  true  of  the  variance  provided  we  make  the  assumption  that  the 

signal  X  is  Gaussian  so  that  we  can  characterize  fourth  moments  in  terms  of  S(e^^). 

Since  the  estimator  S(e-'®,x)  is  quadratic  in  the  data  vector  x,  its  expectation  will  be  a  linear 

functional  of  5(e-’®).  We  shall  construct  two  expressions  for  this. 

Let  Rri  be  the  covariance  of  the  data  vector  x: 

To  r_i  r_,v+i  ■ 

/?„  =  £[xx*]=  (1.5S) 

:  r_i 

ri  I'D  . 

Using  the  notation  of  Section  3.0,  we  may  use  the  Fourier  representation  (43)  to  obtain  a  representation  for 
the  matrix 

T 

(1.59) 

-  If 

Now  let  us  compute  the  mean  of  ,x): 

£:[5(e^^x)]  =  E[x*Die^^)V'VDie-^^)x] 

=  Tt[D{e^^)V*VD{e-^^)Rrr]  (160) 

=  TT[VD{e-^^)R:,^D{e’^)V*]. 

This  is  the  first  expression  for  the  mean,  and  it  uses  the  covariance  matrix  This  can  be  reduced  to  a 
linear  functional  of  5(e^*)  by  using  the  representation  (159)  for  /?„.  The  result  is 

£;[5(e^^x)]  =  J  5(eJ<')||v/iIi(e>(®-«))|l"^.  (161) 

—  W 

This  bears  comment.  Let 

lU(x)  =  U'Kz)=  (162) 

JVmiz). 

Then  UV(e-'^)  is  the  frequency  response  of  the  row  of  V ,  which  we  think  of  as  a  lowpass  filter.  Tlie  mean 
of  the  estimate  is  the  convolution  of  tlie  true  spectrum  with  the  positive  real  function 

l|lV'(f^'')ir  =  f;|U^,.(e^'’)|‘.  (163) 

,1=1 
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This  is  the  sum  of  tiie  magnitude-squared  responses  of  the  m  lowpass  fiiters.  If  these  fiiters  liave  smaii 
hnndwidth.  then  Ef5(e'’^.x)]  will  be  a  weighted  average  of  the  values  of  the  true  spectrum  on  an  interval 
of  tile  same  width  centered  at  6.  This  is  the  means  by  which  we  can  judge  the  resolution  of  the  estimator. 


4.5  Variance  eind  Mean-Squared  Error  of  the  Estimator 

We  must  assume  that  the  data  vector  x  is  real  valued  and  Gaussian  with  mean  vector  0  and 
covariance  matrix  Rxr-  If 


Q  =  A  +  jD 


(164) 


is  Hermitian,  then  the  random  variable 

y  =  x^Qx  =  x^.4x 


(165) 


has  mean  and  variance 

£[y]  =  Tr[Q/Z„]  =  Tr[T/?„]  (166) 

var(y)  =  2Tr[(.4/?,,)*].  (167) 

Our  estimator  has  this  form,  with 

Q{e)  =  D(e^^)V*VD(e-^^)  (168) 

and 

.4(^)  =  ^[Q(^)  +  Q(-^)].  (169) 

Therefore. 

var5fV*,x)=  iTr[(Q(^)-hQ(-0))/i,,]".  (170) 

If  we  expand  the  square  in  this  expression,  we  will  arrive  at  four  terms.  It  turns  out  that  two  of  these  have 
the  same  trace,  and  the  remaining  two  have  the  same  trace.  Define 


A[(0)  =  VD(e-^^)RxrDie^^)V*  (171) 

N(0)  =  VD{e^^)RxrD(e-^^)V*.  (172) 

(.'.'ling  tlie  trace  identity 

Tr(  FG)  =  Tr(  CD  (17;i) 
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liberally,  we  see  that 


IVlQCe) /?„(?(())/?„]  =  Tr[M-{e)]  (171) 

Tr[C)(0)/?„g(-0)/?„]  =  Tr[.V(-0)A^(^)] 

Tr[Q(-D);?,,Q((?)/?„]  =  Tr[iV(0).V(-0)] 

=  TT[Ni-e)N(0)] 

Tt[Q{-e)R^,Q(-9)R.,]  =  Tr[A/2(_^)]. 

The  middle  two  terms  have  the  same  trace.  In  addition,  M^(d)  and  M‘{—6)  differ  only  in  imaginary  part. 
Since  they  are  both  Hermitian,  they  have  real  trace,  and  therefore  they  have  the  same  trace.  Combining  all 
this  yields  the  variance  expression 

va^5ieJ^x)  =  lV[A/2(^)]  +  Tr[Ar(0)iV(-0)] 

=  TV[.U(^)Ar(0)]  +  Tr[A^(^)Ar*(0)] 

p=i k=i 

We  can  put  this  in  terms  of  S{e^^)  by  using  the  representation  (159)  for  the  matrix  in  the  definitions 
for  M(6)  and  A'(^l).  This  yields 

ir 

A/,k(0)  =  J  5„(e^'’)W^(e-’(<'-^))W:(e>(»-^))  ^  (176) 

—  T 

W 

N^k(O)  =  J  5„(<^)lV^(e-^«*+*>)in*(e>(«-^))  (177) 

—  IT 

Of  the  two  variance  terms,  the  one  involving  M{0)  will  likely  be  much  greater  than  the  one  involving 
A'(f^)  because  of  the  misalignment  of  the  window  response  functions  in  equation  (177).  From  equations  (160) 
and  (175),  we  can  write 

£[5('>^^x)]  =Tr[.\/«?)]  (178) 

var[5(e^*.x)]  >  Tr[A/'(f?)] .  (170) 
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The  inequality  is  off  by  tlie  term  involving  N{0).  These  can  be  used  to  construct  a  lower  bound  on  mean- 
squared  error: 

MSE(<?)  =  £[5„(e-"')  -  5(e>^x)]‘ 

=  -  E5(e^'«,x)]"  +  var[5(e>^x)]  (180) 

>  -  TrA/(0)]*  +  TrM-{9). 

Let  the  eigenvalues  of  A/(0)  be  {/ii, . . . , Then 

M  o  M 

MSE((?)>  (181) 

ib=l  k  =  l 

The  noinimum  of  this  expression,  cis  a  function  of  the  eigenvalues,  occurs  when 

ii*  =  -  for  each  k.  (182) 

m  -I-  1 

Placing  the  minimum  in  the  right-hand  side  of  (171)  produces  the  inequality 

S-  (e^^) 

MSE(0)  >  ,  T  (183) 

m  -f  1 

Since  .\f{9)  is  Hermitian.  it  can  have  identical  eigenvalues  only  if  it  is  a  scalar  multiple  of  the  identity — i.e. 

M{e)  =  vD*(e}RrrD(e)v  =  (i84) 

m  -f  1 

The  lower  bound  (183)  provides  a  reason  for  using  more  than  one  window,  (m  is  the  number  of 


windows  and  the  number  of  rows  in  V.) 


5.0  SPECIAL  QUADRATIC  ESTIMATORS 


Tlie  spectral  estimator 


(185) 


of  equation  (141)  composes  three  operations;  demodulation,  lowpass  filtering,  and  energy  measurement.  It 
is  interesting  to  compare  this  chain  of  operations  with  a  superheterodyne  radio  receiver.  The  heart  of  such  a 
receiver  is  a  high  quality  bandpass  filter  (the  IF  amplifier)  with  a  fixed  center  frequency  and  bandwidth.  The 
rest  of  the  receiver  merely  accomodates  this  filter.  The  incoming  broadband  signal  is  frequency  shifted  so 
that  the  desired  band  is  moved  into  the  IF  band.  On  the  other  side,  the  output  of  the  IF  filter  is  “detected." 
The  sensitivity  of  the  receiver  depends  mostly  on  the  quality  of  the  IF  filter.  For  our  quadratic  spectrum 
estimator,  the  quality  (variance  and  resolution)  depends  entirely  on  the  choice  of  lowpass  filters  represented 
by  the  rows  of  the  matrix  V'. 

In  this  section,  we  will  present  some  likely  choices  of  K  in  a  somewhat  developmental  order  beginning 
with  rank-one  estimators.  Since  several  classical  estimators  have  the  form  of  equation  (141),  we  will  display 
them  when  appropriate. 


5.1  Rank-One  Estimators 

For  a  rank-one  estimator,  there  is  but  one  lowpass  filter,  and  the  matrix  V  is  a  row  vector.  The 
estimator  has  the  form  of  equation  (150).  That  is. 


/V-l 


XkWte 


-jk9 


(186) 


k=0 


and 


(187) 


There  is  a  choice  of  interpretations 

As  before,  we  can  think  of  demodulation,  followed  by  lowpass  filtering.  In  this  interpretation,  the 
matrix  V  represents  a  lowpass  filter  pulse  response  .sequence,  and  we  are  using  only  one  value  of  the  lowpassed 
output  sequence  to  estimate  power. 

The  more  common  point  of  view  is  that  we  first  “window"  that  data  Xk  by  multiplying  by  tcj.  then 
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demodulate  and  perform  a  time  average.  The  shape  of  tiie  window  function 


5.2  LoW'Rank  Estimators  and  the  Time  Bandwidth  Product 

The  virtue  of  using  more  than  one  lowpass  filter — i.e.  of  using  a  rank  m  estimator  with  m  >  1 — lies 
in  the  potential  of  decreased  estimator  variance.  Up  to  a  point,  the  mean-squared  error  should  be  inversely 
proportional  to  m.  For  given  resolution  requirements,  however,  there  is  a  limiting  value  of  m  beyond  which 
no  improvement  can  be  expected.  This  limit  is  a  “time  bandwidth  product.”  In  this  section,  we  will  put 
forth  an  argument  for  this  assertion. 

The  choice  of  normalized  bandwidth  0,  with 


0  <  /?  <  1, 


is  somewhat  arbitrary.  It  is  a  rough  measure  of  the  resolution  in  frequency  we  expect  from  our  spectrum 
estimator.  If  we  assume  that  the  power  spectrum  is  essentially  constant  on  intervals  of  length  2^0,  then 
an  estimate  of  the  power  in  the  band  \0  —  0o\  <  0Tf  serves  as  an  estimate  of  /?5rr(e'’*°).  In  other  words, 
we  get  an  estimate  of  by  estimating  the  power  in  a  band  about  do-  This  estimate  will  improve  as 

we  increase  0.  ljut  only  at  the  expense  of  resolution.  This  tradeoff  is  inescapable,  and  the  sum  of  ill  effects 
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diniinishes  only  with  increasing  data  lengtli  A".  For  modulation-invariant  estimators,  the  probleTi  can  be 
reduced  to  the  estimate  of  power  in  the  low-frequency  band 


|0|<^tr.  (191) 

(with  =  0).  This  is  the  passband  of  the  bank  of  low  pass  filters  described  by  the  rows  of  the  matrix  W 
For  convenience,  call  this  band  baseband. 

For  the  purpose  of  estimating  baseband  power 


we  would  like  to  isolate  the  baseband  component  of  the  signal.  But  of  course,  we  have  only  a  finite  time 
window’s  worth  of  data.  We  are  caught  in  a  classic  dilemma — trying  to  isolate  a  signal  (or  a  signal  component ) 
in  both  time  and  frequency.  The  more  concentrated  a  signal  becomes  in  one  domain,  the  more  dispersed  it 
must  be  in  the  other.  Although  we  cannot  do  this  simultaneous  isolation  exactly,  we  must  do  it  approximate!}' 
if  we  are  to  do  spectrum  estimation. 

Both  operations — isolation  in  time  (windowing)  and  isolation  in  frequency  (lowpass  filtering) — arc 
projection  operations.  For  the  moment,  let  x  be  the  entire  WSS  signal.  Let  Pp  be  the  projection  which 
zeros  out  all  values  of  the  signal  except  those  in  the  data  window: 


iPTx)k  = 


Xfc.  0<k<N-l 
0,  otherwise. 


Let  Pfi  be  the  ideal  baseband  filter; 


{Pnx)k  =  ^  hixt-i, 


where 


I  0,  otherwise. 
i  =  -oo  ' 


Each  of  thc-c  Ct^^iators  is  an  orthogonal  projection; 

P=P^-  =  P*.  (196) 

But  they  do  not  commute.  (The  ba.scband  component  of  the  time  windowed  signal  is  not  the  time  restricted 
l^iseband  component.)  If  they  did  commute,  then  the  product 


P=  PrPn 


A\ 


would  also  he  a  projection.  This  projection  would  simultaneously  isolate  a  signal  component  in  both  time 
and  frequency. 

In  certain  respects,  however,  the  products  are  approximately  equal; 

PTPn^PnPr,  (19S) 


and  the  product  is  close  to  a  projection  having  rank  N0,  the  time-bandwidth  product.  This  assertion  can  be 
supported  in  various  ways,  each  of  which  corresponds  to  known  multiple  window  spectrum  estimators.  We 
will  sketch  some  of  these  in  the  following  sections.  For  the  moment,  let  us  consider  the  consequences — on 
estimator  variance — of  the  assumptions  that  equations  (197)  and  (198)  hold. 

Suppose  that 


P  =  PrPfi  =  PaPr, 


(199) 


is  a  rank  m  =  A'/?  projection,  and  assume  that  the  WSS  signal  x  is  Gaussian.  Then  Px  is  Gaussian  and  is 
concentrated  on  a  subspace  of  dimension  m,  even  though  there  may  be  A'  values  of  the  vector  which  differ 
from  zero.  At  most  m  scalar  functions  of  Px  can  be  statistically  independent.  In  particular,  at  most  rn 
independent  estimates  of  baseband  power  can  be  obtained,  and  therefore  we  can  expect  an  improvement  of 
1/m  in  estimator  variance  when  these  are  averaged.  But  we  cannot  expect  any  more  than  this. 

If  we  actually  had  our  hypothetical  projection  P.  how  should  we  estimate  If  P  =  PjPrt, 

then  Px  =  Pry,  where  y  is  the  WSS  baseband  component  of  the  WSS  signal  x.  Therefore, 

0T 

£[!/*]=  j  «  /?5„(e>°)  (200) 

~0W 

(assuming  that  the  spectrum  is  fairly  constant  on  intervals  of  width  2/?Jr).  Now,  using  the  definition  of  Pt, 


k=0 


(201) 


If  we  use  this  as  an  estimate  of  the  variance  of  y*  and  combine  it  with  equation  (200),  we  get  an  estimate 
for  Srrie^°)- 

S{e^^)  =  ^\\Pry\\-  =  j^\\Px\\-.  (202) 


This  estimate  is  obviously  quadratic  in  the  data  and  has  rank  m  =  .\'0.  the  time  bandwidth  product 
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W'e  have  been  using  the  notation  x  for  tlie  entire  signal 


X  =  {j-fc  :  -oc.  <  A-  <  3c}. 

The  data  vector  which  appears  in  the  estimator  (185)  is  there'bre 

■Co 


L  x  .v_i  J 


=  F^x. 


\v  here 


= 


(203) 


(ion 


is  an  .V  X  oc  matrix.  The  connection  between  the  estimates  (185)  and  (202)  would  be 

F(e^°,F^x)  =  ||l'F^x||-  =  ^||Fx||^  (OO.V) 

Thi.'  would  determine  1'  up  to  multiplication  on  the  left  by  an  orthogonal  matrix,  via 

fyTyjxT^^p 

and  ther^-fore  (since  F^F  =  I) 

V'^V  =  —F'^PF.  (206) 

A  J 

This  formula  of  course  involves  a  projection  P  which  does  not  exist.  It  is  useful  only  for  guessing  what 
should  actually  be  done.  Let  us  now  examine  some  approaches. 


5.3  Time  Division  Multiple  Windows 

In  [tj],  Welch  proposed  a  quadratic  multiple  window  estimator  in  which  the  matrix  V  would  have 

the  form 


V'  = 


(207) 


Th'=  first  row  contains  a  symmetric  window  of  length  much  le.ss  than  A',  followed  by  all  zeros.  The  remaining 
rows  are  shifts  of  the  first  row.  by  equal  amounts,  so  that  the  last  row  is  right  lustified.  This  matrix  satisfies 
the  ./-‘^ytnmet  ry  condition 

\j  =  rv. 

•13 


witli  r  simply  an  rn  x  rn  version  of  J .  Let  us  give  an  interpretation  of  this  estimator. 


As  before,  let  y  =  Pfix  be  the  baseband  component  of  the  WSS  signal  x.  Since  y  is  bandlimitcd.  it 
is  completely  determined  by  the  subsequence  of  samples  spaced  l/l3  apart  (assuming  that  this  is  an  integer). 
Loosely  speaking,  then,  from  every  large  set  of  X  consecutive  values  of  y,  only  N >3  are  useful.  Moreover, 
if  5(t-'^)  IS  constant  on  baseband,  then  the  subsamples  of  y  (at  spacing  1/d)  are  indeed  uncorrelated. 
Furthermore,  tiie  covariance  matrix  of 


yo 


t/.v-i 


=  F'^y 


(■->081 


should  have  m  =  XJ  essentially  equal  dominant  eigenvalues,  with  the  other  A>  —  m  =  (1  —  3)X  very  close 
to  zero.  This  suggests  that  the  baseband  power  estimate 


1  * 
k  =  0 

would  have  about  the  same  variance  as  an  estimate  which  used  all  .V  values  of  y  but  one-tn'^'  the  variance 
of  an  estimate  which  used  only  one  value  of  y. 

The  Welch  estimator  approximates  the  scenario  we  have  just  described.  The  first  row  of  \’  contains 
a  properly  normalized  lowpass  filter  unit  pulse  response.  Because  of  the  successive  shifts,  the  vector 


represents  m  samples  of  the  output  of  an  FIR  filter,  equally  spaced  with  spacing 

1_  _  ^ 

/?  m  ’ 

so  that 

.  m—  1 

Jl|rF'^x||‘  =  (210) 

i=0 

Here  y,  is  the  FIR  lowpass  filter  output,  approximating  the  actual  baseband  signal  values  used  in  the 

estimate  CdOO). 

In  this  formulatioti.  one  lowp.ass  filler  is  used  whose  passband  is  the  entire  baseband.  One  then 
samf)les  the  output  signal  at  the  .Nyqiiist  rate  to  get  (in  the  (Jaussian  rase)  .something  close  to  st at ist icall> 


independent  and  identically  distributed  random  variables.  The  number  of  these  is  appro.Kimately  X3.  the 
time  bandwidth  product. 

5.4  Frequency  Division  Multiple  Windows 

The  estimator  of  the  previous  section  uses  N  3  output  samples  of  a  single  lowpziss  filter  of  bandwidth 
3.  The  other  extreme  is  to  use  an  output  sample  from  each  of  N 3  narrowband  filters,  each  having  bandwidth 
l/.^  ,  This  is  the  smallest  bandwidth  for  which  the  Nyquist  subsampling  rate  for  the  output  signal  would 
deliver  at  least  one  sample  in  a  block  of  .V  consecutive  values.  Since  we  must  limit  ourselves  to  FIR  filters  of 
length  .V.  we  cannot  achieve  a  perfect  decomposition  into  narrow  bands  in  this  way.  We  shall  see,  however, 
that  X 3  is  an  upper  bound  on  the  number  of  length  X  FIR  bandpass  filters  for  which 

(i)  the  pulse  response  sequences  are  orthogonal,  and 

(ii)  the  individual  passbands  are  inside  baseband. 

For  white  Gaussian  data,  the  first  condition  would  make  the  output  samples  of  any  two  bandpass  filters 
uncorrelated  and,  therefore,  statistically  independent. 

The  simplest  example  of  this  approach  is  derived  from  the  length  X  discrete  Fourier  transforms. 
We  shall  begin  with  this  case  and  then  attempt  to  generalize.  To  facilitate  our  discussion,  suppose  that 

X  3  —  tn 

is  a  positive  odd  integer.  Construct  V  by  letting  its  m  rows  consist  (in  any  order)  of  normalized  rows  of  the 
DFT  matrix: 

=  (211) 

where  1  (see  equation  (ld9))  is  the  vector  whose  elements  are  ail  one.  The  normalization  is  chosen  so  that 
the  estimator  is  unbiased  in  the  while  noise  ca.se  or,  equivalently, 

Tr[RR*]=l.  (212) 


The  row  V^'>  is  the  unit  pulse  response  of  an  FIR  bandpass  filter  with  passband 


having  normalized  bandwidth  1/A'.  The  rn  rows  of  V'  are  orthogonal  and,  in  fact, 


Vi'*  = 


(214) 


Therefore  this  example  meets  the  two  conditions  we  have  specified,  with  a  liberal  interpretation  of  filter 
passband. 

The  spectrum  estimator  constructed  with  this  choice  of  K  is  intimately  related  to  the  periodogram 
(to  which  it  reduces  when  m  =  1).  Letting  Si  be  the  periodogram  of  equation  (147),  our  estimator  becomes 


5(eJ*,x)=—  ^  (215) 

h  K?- 

To  obtain  a  sample  of  5,  one  takes  the  average  of  m  samples  of  Si- 

Toeplitz  Quadratic  Estimators.  Generalizing  the  notion  of  averaging  the  periodogram  leads  to 
Toeplitz  estimators.  Let 

CC 

//(eJ')=  (216) 

is  — CO 

be  nonnegative  real  and  concentrated  in  the  vicinity  of  0  =  0.  Let 


5l(e^^x)  =  l|l^D(e-^'’)x|^ 


(217) 


be  the  periodogram,  as  in  equation  (144).  Consider  the  averaged  periodogram 

ir 

5(e^^x)=  i  j  //(e>^)5,(e>('-^l,x) 

—  ir 

This  estimator  is  quadratic  and  has  the  form 


5(e^*,x)  =  ||l^D(e->»)x||^ 


where 


Q  =  V*V 

W 

"In  j  D(e^ 


dd 


(218) 


(219) 
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Tills  matrix  is  nonnegative  definite  and  Toeplit:: 

A 

Oh  =  f  (220) 

—  ir 

If  is  composed  of  m  delta  functions  (and  is  therefore  concentrated  on  m  points,  as  it  would  be  for 

the  estimator  of  equation  (215)),  then  Q  has  rank  m.  If,  on  the  other  hand,  H  is  positive  on  an  interval, 
then  Q  will  be  positive  definite  and  have  rank  N.  Thus,  averaging  the  periodograrn  increases  the  estimator 
rank.  (Recall  that  averaging  the  data,  as  in  Section  5.1,  does  not  increase  rank.)  All  nonnegative  definite 
Toeplitz  matrices  have  the  representation  (219)  or  (220). 

The  spectrogram  of  Granander  [19]  is  Toeplitz  in  our  present  sense.  Consider  once  more  the 
problems  of  estimating  baseband  power  and  then  using  this  to  estimate  the  power  spectrum  at  0  =  0  as  in 
equation  (202).  The  operator  P  in  that  equation  is  the  approximate  projection 

P  =  PtPci, 

where  Pn  is  the  Toeplitz  (lowpass  filter)  projection  of  equation  (194)  and  Pr  is  the  projection  of  rank  N 
which  zeroes  out  all  sequence  elements  except  for  those  in  the  range  0  to  A'  -  1.  This  can  be  written 

Pt  ~  FF^, 


where  F  is  given  by  (204).  It  follows  that 


F^Pt  =  F^, 


and  therefore  the  quadratic  form  in  equation  (206)  is 


Q  =  V'^V  =  ^F'^PrPnF 


(221) 


This  matrix  is  a  scaled  .V  x  N  diagonal  block  of  the  Toeplitz  prection  operator  Pn-  It  is  also  identical  to 
the  matri.x  of  equation  (219),  provided  Il(€^^)  is  the  ideal  baseband  filter.  And,  finally,  coming  full  circle,  if 
R  is  the  matrix  of  equation  (17),  then 


(222) 
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Now  Q  is  positive  definite  and  has  trace  one.  However,  N  —  m  =  A'(l  —  /d)  of  its  eigenvalues  are  very  close 
to  zero,  while  the  next  are  close  to  1/m.  Thus.  Q  is  close  to  a  rank  m  matrix.  Roughly  speaking,  this  means 
that  estimator  error  variance  can  be  decreased  by  the  factor  1/m  only,  even  though  the  actual  rank  of  Q  is 
much  larger. 

Toeplitz  quadratic  forms  are  natural  choices  in  view  of  the  fact  that  every  linear  functional  of  the 
power  spectrum  is  the  expected  value  of  a  Toeplitz  bilinear  form  in  the  infinite  data  sequence  (Section  4.0). 
It  has  been  argued  here  that  a  desirable  quadratic  form  for  estimating  baseband  power  would  have  the 
properties 

(  Q  =■  V’V'  is  Toeplitz 

<  rank((5)  =  m  =  A';?  (223) 

.  V'V'*  —  (orthogonal  filter  responses). 

Our  first  example,  equation  (211),  has  these  properties.  In  general,  if  V’’!'  is  Toeplitz,  then  the  estimator 
is  a  smoothed  or  averaged  periodogram  as  in  equation  (218).  If,  in  addition,  it  has  rank  m,  then  it  is  the 
average  of  m  samples  of  the  periodogram.  Finally,  it  can  be  shown  that,  if  V'*V'  is  diagonal  in  addition  to 
the  other  two  properties,  then  the  samples  of  the  periodogram  must  be  separated  by  integer  multiples  of 
2tr/A'.  For  the  problem  of  estimating  baseband  power,  these  samples  should  be  in  baseband  which  has  width 
2'xm/S .  Thus  our  example  is  essentially  uniqueVOf  course,  multiplication  on  the  left  by  a  unitary  matrix 

V"  ^UV 


will  produce  the  same  estimator.  If  Q  satisfies  the  conditions  of  equations  (223),  then  P  =  mQ  is  a  Toeplitz 
orthogonal  projection.  Thus 


(224) 


5(e^',x)=  |jKZ?(e->»)x||- 

=  —  l|PD(e--’'’)x||^ 

is  a  projection-based  estimator.  Finally,  every  symmetric  real  Toeplitz  matrix  Q  commutes  with  J .  This  is 
enough  to  make  Toeplitz  quadratic  estimators  y-symmetric  in  the  sense  of  Section  4.3,  since 

Q  =  JQJ 


(V'y)*(V'./)  =  v*v. 
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implies 


which  implies  in  turn  that 


VJ  =  UV 

for  seme  unitary  matrix  U . 

Non-Toeplitz  Forms.  Now  let  us  relax  the  requirement  that  V*V  be  Toeplitz.  We  require  the 
maximum  number  of  orthogonal  filters,  whose  passbands  are  in  the  baseband.  Two  orthogonal  filters  will 
give  independent  output  samples  in  the  Gaussian  white  data  case.  (Essentially,  the  power  spectrum  must 
be  nearly  constant  on  bands  of  normalized  bandwidth  H.)  Therefore  we  can  decrease  error  variance  by  the 
factor  1/m,  where  m  is  the  number  of  orthogonal  filters  we  can  squeeze  into  the  baseband.  What  is  required 
at  this  point  is  a  precise  statement  of  what  this  means. 

Let 

N-l 

^  (22.5) 

k=0 

be  the  frequency  response  function  of  an  FIR  filter.  A  minimum  requirement,  if  this  filter  is  to  be  a  baseband 
filter,  is  that,  for  some  choice  of  e  >  0, 

/  ^  >  (1  -  0  / (226) 

In  other  words,  most  of  the  filter  energy  should  be  in  baseband.  This  is  a  quadratic  inequality  and  can  be 
written 

h^Rh>  (1 -f)h^h,  (227) 

where 

h  =  [hohi  -hN.yf  (228) 

and  R  is  the  Toeplitz  baseband  autocorrelation  matrix  of  equation  (17). 

Each  row  of  V  corresponds  to  a  filter  response.  The  two  conditions  together  lead  to  the  following 
problem.  Find  the  largest  (meaning  m)  m  x  N  matrix  V  for  which 

l/[/Z- (1 >  0  (229) 

and 

VV*zz—l.  (230) 

m 
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The  eigenvalues  of  R  are 


Al  >  A5  >  ■ .  •  >  Ajv  >  0, 

and  the  eigenvalues  of  f?  —  (1  —  f)7  are 

Al-(I-f) . A-v-(l-0- 

But  we  know  the  sum  of  these  eigenvalues,  using  equation  (17),  to  be 

v 

Tr(fl)  =  A?  =  .V/?.  (231) 

t=i 

Thus,  for  f  sufficiently  small,  at  most  N3  of  the  eigenvalues  of  R  can  be  greater  than  1  —  f.  The  inequal¬ 
ity  (217)  cannot  hold  if  fewer  than  m  of  the  eigenvalues  of  /?  —  (1  —  t)I  are  nonnegative.  Therefore,  NJ  is 
an  upper  bound  on  the  rank  m  for  this  problem  (when  e  is  sufficiently  small). 

A  stronger  statement  of  this  situation  is  found  in  the  following. 

Theorem:  Let  M  be  the  mxm  matrix  on  the  left  of  inequality  (229),  and  let  M  have  eigenvalues 

PI  (232) 


Then  if  equation  (229)  holds, 


for  each  i. 


2  .  Af-(I-0 

"■s — ;;; — 


(233) 


The  proof  of  this  theorem  involves  a  generalized  Rayleigh  quotient  inequality  argument.  Of  course, 
M  is  nonnegative  definite  if  and  only  if  all  m  of  its  eigenvalues  are  nonnegative.  Equality  in  (233)  can 
be  obtained  by  letting  the  columns  of  V*  be  eigenvectors  of  R,  in  particular  those  having  the  m  largest 
eigenvalues.  This  choice  produces  the  Thomson  spectral  estimator,  and  J  is  symmetric.  (This  theorem 
addresses  the  problem  in  Section  2.5.) 

In  conclusion,  we  have  argued  that,  for  a  specified  resolution  0,  there  is  a  limit  to  the  reduction 
in  error  variance  possible  with  multiple  window  quadratic  estimators.  The  rank  m  should  be  approximately 
the  time  bandwidth  product  N 0. 


G.O  CONCLUSIONS 


The  basic  problem  in  classical  spectrum  analysis  is  to  project  a  time  series  onto  a  timelimited 
and  bandlimited  subspace  where  power  can  be  estimated.  Such  a  subspace  can  only  be  approximated,  so 
the  problem  can  be  rephrased  as  one  of  constructing  approximating  subspaces  and  projections  onto  them. 
The  first  obvious  approach  is  to  build  a  frequency  selective  FIR  filter,  and  the  natural  extension  of  this 
approach  is  to  build  a  frequency  selective  linear  transformation.  In  Section  2.0  of  this  chapter  we  have 
followed  this  approach  to  its  logical  conclusion  and  found  Slepian  sequences  as  the  appropriate  sequences  for 
building  a  subspace  that  is  timelimited  and  approximately  bandlimited.  In  Section  3.0  we  have  rephrased  the 
problem  of  spectrum  analysis  as  one  of  estimating  parameters  in  a  structured  covariance  matrix.  Maximum 
likelihood  estimates  of  these  parameters  produce  spectrum  estimators  which  are  essentially  equivalent  to 
the  multiwindow  spectrum  estimators  of  Thomson  and  to  rank  reduced  versions  of  Daniell's  frequency 
averaged  periodogram.  The  maximum  likelihood  spectrum  estimate:  is  a  quadratic  form  in  the  data  that 
is  formed  by  complex  demodulating  the  time  series,  projecting  it  onto  a  low-rank  subspace,  and  computing 
its  power  in  that  subspace.  The  mean-squared  error  of  the  estimator  decreases  inversely  with  the  rank  of 
the  quadratic  form  that  is  constructed  in  this  way.  In  Section  4.0  we  show  that  every  quadratic  estimator 
of  the  power  spectrum  that  is  required  to  be  nonnegative  and  modulation  invariant  must  be  a  quadratic 
form  in  a  complex  demodulated  time  series.  This  fundamental  representation  theorem  characterizes  a  cla.ss 
of  admissible  spectrum  estimators.  The  maximum  likelihood  estimators  are  members  of  this  claiss,  but  there 
are  others.  They  include  all  of  the  windowed  and  frequency  averaged  periodograms  and  others  discussed  in 
Section  .5.0. 

From  the  point  of  view  of  this  chapter,  there  is  nothing  very  fundamental  about  uniform  sampling, 
nor  is  there  anything  very  fundamental  about  a  scalar  index  parameter  for  the  data.  This  means  that 
our  results  extend  to  the  frequency-wavenumber  analysis  of  nonuniformly  sampled  space-time  series.  These 
extensions,  developed  somewhat  in  [10]  and  [11],  form  the  basis  for  a  research  program  in  the  spectrum 
analysis  of  mult iparameter  data  sequences. 
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